Attention-Passing Models for Robust and Data-Efficient End-to-End Speech Translation
Authors
Abstract
Similar resources
Joint CTC/attention decoding for end-to-end speech recognition
End-to-end automatic speech recognition (ASR) has become a popular alternative to conventional DNN/HMM systems because it avoids the need for linguistic resources such as pronunciation dictionaries, tokenization, and context-dependency trees, leading to a greatly simplified model-building process. There are two major types of end-to-end architectures for ASR: attention-based methods use an attenti...
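As a rough illustration of the joint CTC/attention decoding idea summarized above, the sketch below shows the common way the two branches are combined during beam search: their log-probabilities for the same partial hypothesis are interpolated with a tunable weight. The function name joint_score and the default ctc_weight are illustrative assumptions, not the cited paper's implementation.

```python
def joint_score(log_p_attention, log_p_ctc, ctc_weight=0.3):
    """Interpolate attention-decoder and CTC log-probabilities for one hypothesis.

    log_p_attention, log_p_ctc: log-probabilities assigned to the same partial
    transcription by the two branches of the model.
    ctc_weight: interpolation weight (a hypothetical default; typically tuned).
    """
    return ctc_weight * log_p_ctc + (1.0 - ctc_weight) * log_p_attention


# Toy usage: rank two competing beam hypotheses by their joint score.
hypotheses = {
    "hello world": {"att": -2.1, "ctc": -2.8},
    "hollow word": {"att": -2.0, "ctc": -4.5},
}
best = max(hypotheses, key=lambda h: joint_score(hypotheses[h]["att"], hypotheses[h]["ctc"]))
print(best)  # "hello world": the CTC branch penalizes the poorly aligned alternative
```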
Local Monotonic Attention Mechanism for End-to-End Speech Recognition
Recently, encoder-decoder neural networks have shown impressive performance on many sequence-related tasks. The architecture commonly uses an attentional mechanism which allows the model to learn alignments between the source and the target sequence. Most attentional mechanisms used today are based on a global attention property, which requires a computation of a weighted summarization of the who...
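To make the global attention property concrete, the sketch below computes, for a single decoder step, a softmax-weighted summarization over the entire encoder output. The shapes, dot-product scoring, and variable names are assumptions chosen for brevity, not the exact formulation of the cited paper.

```python
import numpy as np

def global_attention(decoder_state, encoder_states):
    """Soft (global) attention for one decoder step.

    decoder_state:  vector of shape (d,)
    encoder_states: matrix of shape (T, d), one row per source frame
    Returns a context vector (weighted sum over all T frames) and the weights.
    """
    scores = encoder_states @ decoder_state          # dot-product scores, shape (T,)
    scores = scores - scores.max()                   # numerical stability
    weights = np.exp(scores) / np.exp(scores).sum()  # softmax over the whole sequence
    context = weights @ encoder_states               # weighted summarization, shape (d,)
    return context, weights

# Toy usage: 5 source frames, hidden size 4.
rng = np.random.default_rng(0)
context, weights = global_attention(rng.standard_normal(4), rng.standard_normal((5, 4)))
print(weights.sum())  # weights over the full source sequence sum to 1
```

A local monotonic variant, as proposed in the paper above, would instead restrict this softmax to a window around a monotonically advancing source position rather than attending to all T frames at every step.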
End-to-End Speech Recognition Models
For the past few decades, the bane of Automatic Speech Recognition (ASR) systems has been phonemes and Hidden Markov Models (HMMs). HMMs assume conditional independence between observations, and the reliance on explicit phonetic representations requires expensive handcrafted pronunciation dictionaries. Learning is often via detached proxy problems, and there especially exists a disconnect betw...
Attention-Based End-to-End Speech Recognition on Voice Search
Recently, there has been an increasing interest in end-to-end speech recognition that directly transcribes speech to text without any predefined alignments. In this paper, we explore the use of an attention-based encoder-decoder model for Mandarin speech recognition and, to the best of our knowledge, achieve the first promising result. We reduce the source sequence length by skipping frames and reg...
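One of the tricks mentioned above, reducing the source sequence length by skipping frames, can be sketched as plain temporal downsampling of the acoustic feature matrix before the encoder; the skip factor and feature dimensions below are illustrative assumptions rather than the paper's settings.

```python
import numpy as np

def skip_frames(features, skip=3):
    """Keep every `skip`-th acoustic frame to shorten the encoder input.

    features: array of shape (T, feat_dim), e.g. log-mel filterbank frames.
    Returns an array of shape (ceil(T / skip), feat_dim).
    """
    return features[::skip]

frames = np.random.randn(1000, 80)        # ~10 s of 10 ms frames, 80 filterbanks
print(skip_frames(frames, skip=3).shape)  # (334, 80): a much shorter sequence for attention
```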
End-to-End Automatic Speech Translation of Audiobooks
We investigate end-to-end speech-to-text translation on a corpus of audiobooks specifically augmented for this task. Previous work investigated the extreme case where the source-language transcription is available neither during learning nor during decoding, but we also study a midway case where the source-language transcription is available at training time only. In this case, a single model is trained to decod...
Journal
Journal title: Transactions of the Association for Computational Linguistics
Year: 2019
ISSN: 2307-387X
DOI: 10.1162/tacl_a_00270